INTRODUCTION


This project explores an innovative computer-aided detection (CAD) strategy for improving early
detection of breast cancer in screening mammograms by focusing on computerized analysis and
detection of cancers missed by radiologists. Because data collection proved unexpectedly difficult
in the first year of research, the Statement of Work was revised, with DoD approval, to focus on
the most important research items.


Objective  1:  to  generate  databases  for  missed  cancer  analysis  and  detection. 
Accomplishments: 


1.  Data  Collection  Criteria  and  Procedure 

a.  The  criteria  for  inclusion  in  this  study  were  as  follows: 

1.  Mass  must  be  visible  on  mammogram 

2.  Mass  must  be  proven  by  biopsy  to  be  malignant 

3.  Mass  must  be  seen  in  retrospect  on  a  prior  mammogram  when  reviewed  by  a  radiologist 

b.  Procedure  used  for  case  selection: 

1.  Lists  of  patients  from  both  the  screening  and  diagnostic  centers  were  obtained 

2. Each patient’s chart was reviewed to select masses that were visible mammographically;
all others were excluded

3. The selected cases were reviewed for malignant pathology outcome; all others were
excluded

4.  Films  were  requested  from  the  diagnostic  center  for  those  cases  with  malignant  masses 

5.  Films  from  the  screening  center  had  to  be  obtained  manually  due  to  lack  of  manpower 

6.  Films  were  reviewed  to  ascertain  whether  the  exam  and  prior  mammograms  were 
available.  Only  those  with  prior  mammograms  were  selected. 

7.  Selected  mammograms  were  reviewed  by  a  radiologist  to  determine  a)  if  the  mass  was 
visible  retrospectively  on  the  prior  exam  and  b)  the  reason  it  was  not  detected  on  the  prior 
exam 

8. The radiologist indicated the location and outlined the contour of the lesion on both
exams and recorded the Breast Imaging Reporting and Data System (BI-RADS) descriptors

9. Ground truth files (hard copy) were generated based on the radiologist’s outlines

10. The films were then digitized manually on a Kodak (LUMISYS) LS85 digitizer at a
resolution of 50 µm and 12 bits of gray scale.

2.  Sources  and  number  of  cases  reviewed:  (as  of  March  23,  2004) 


Query  of  patient  databases  770 

Staging  database  93 

Teaching  files  archive  148 

Breast  conference  patients  100 

Log  of  invasive  procedures  160 

Research  archives  63 

Total  number  of  cases  reviewed  1,334 


3.  Reasons  for  exclusion  of  cases  from  the  original  1,334  patients  reviewed: 

Duplication  of  names  among  lists 
Lesion  was  something  other  than  a  mass

Lesion  was  a  benign  mass

No  pathology  available 

No  information  available  for  this  patient/exam 

No  follow  up  for  this  patient 

Films  were  unavailable  or  incomplete 

Mass  was  not  visible  on  prior  mammogram  (interval  cancer) 

a.  Analysis  of  the  770  names  from  patient  database  queries: 


Reason  Number  excluded 

Duplication  of  names  among  lists  49 

Lesion  was  something  other  than  a  mass  337 
Lesion  was  a  benign  mass  111

No  information  available  51

No  follow  up  available  56 


This  leaves  a  balance  of  166  potential  cases,  of  which: 
Films  were  unavailable  or  incomplete  100 

Mass  not  visible  on  prior  exam  16 

Miscellaneous  exclusions  21 

Usable  cases  29 


b.  Analysis  of  the  93  names  from  the  staging  database: 
Reason  Number  excluded 


Duplication  of  names  among  lists  1 

Lesion  was  something  other  than  a  mass  39 
No  information  available  9 


This  leaves  a  balance  of  44  potential  cases,  of  which: 
Films  were  unavailable  or  incomplete  42 

Usable  cases  2 


c.  Analysis  of  the  148  names  from  teaching  files: 
Reason  Number  excluded 


Duplication  of  names  among  lists  20 

Lesion  was  something  other  than  a  mass  58 
Lesion  was  a  benign  mass  12

No  information  available  13 

No  pathology  available  1 


This  leaves  a  balance  of  44  potential  cases,  of  which: 
Films  were  unavailable  or  incomplete  32 

Mass  not  visible  on  prior  exam  5 


Usable  cases  7

d.  Analysis  of  the  100  names  from  breast  conference  lists:
Reason  Number  excluded 

Duplication  of  names  among  lists  8 

Lesion  was  something  other  than  a  mass  34 
Lesion  was  a  benign  mass  1

No  information  available  12 


This  leaves  a  balance  of  45  potential  cases,  of  which: 

Films  were  unavailable  or  incomplete  29 

Mass  not  visible  on  prior  exam  4 

Usable  cases  12 

e.  Analysis  of  the  160  names  from  invasive  procedures  log: 
Reason  Number  excluded 

Duplication  of  names  among  lists  4 

Lesion  was  something  other  than  a  mass  71 

Lesion  was  a  benign  mass  4 

No  information  available  20 


This  leaves  a  balance  of  61  potential  cases,  of  which: 
Films  were  unavailable  or  incomplete  34 

Mass  not  visible  on  prior  exam  5 

Usable  cases  22 

f.  Analysis  of  the  63  names  from  research  archives:
Reason  Number  excluded 

Duplication  of  names  among  lists  2 

Lesion  was  something  other  than  a  mass  22 

Lesion  was  a  benign  mass  5 

No  pathology  available  9 


This  leaves  a  balance  of  25  potential  cases,  of  which: 
Mass  not  visible  on  prior  exam  11

Usable  cases  14 


Summary: As of March 23, 2004, a total of 86 of the 1,334 reviewed cases had been collected as
missed cancer cases for study. Another 20 cases are projected to be collected before the end of
May 2004, bringing the total number of missed cancer cases to more than 100.
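
The case accounting above is plain subtraction, so it can be checked mechanically. The sketch below is only a consistency check, not part of the study's procedure. It assumes that entries garbled in the scan (for example "1 1 1") are single numbers and derives the usable-case count for each source from the tallies; the dictionary layout and the function name are illustrative.

```python
# Per-source tallies transcribed from the report. Entries garbled in the
# scan (for example "1 1 1") are read as single numbers, and each usable-case
# count is derived rather than copied; both readings are assumptions.
# Layout: (names reviewed, first-pass exclusions, second-pass exclusions).
SOURCES = {
    "patient database queries": (770, [49, 337, 111, 51, 56], [100, 16, 21]),
    "staging database":         (93,  [1, 39, 9],             [42]),
    "teaching files":           (148, [20, 58, 12, 13, 1],    [32, 5]),
    "breast conference lists":  (100, [8, 34, 1, 12],         [29, 4]),
    "invasive procedures log":  (160, [4, 71, 4, 20],         [34, 5]),
    "research archives":        (63,  [2, 22, 5, 9],          [11]),
}


def tally(reviewed, first_pass, second_pass):
    """Return (potential cases, usable cases) for one source."""
    potential = reviewed - sum(first_pass)
    usable = potential - sum(second_pass)
    return potential, usable


results = {name: tally(*counts) for name, counts in SOURCES.items()}
total_reviewed = sum(counts[0] for counts in SOURCES.values())
total_usable = sum(usable for _, usable in results.values())

for name, (potential, usable) in results.items():
    print(f"{name}: {potential} potential, {usable} usable")
print(f"total: {total_usable} usable of {total_reviewed} reviewed")
```

Under these readings the per-source balances match the stated "potential cases" figures, and the usable cases sum to the 86 reported in the summary.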

4.  Characteristic  analysis  of  the  database 

The database was characterized by the following distributions: (a) case distribution in terms
of exam numbers, (b) case distribution in terms of the reasons the cancer was missed (per view
and stage), (c) case distribution in terms of mass shape, (d) case distribution in terms of mass
margin, and (e) case distribution in terms of mass density. The histograms are shown in Figure 1.

Figure 1. Case distribution in terms of (a) exam numbers, (b) missed reasons (E-interpretation
error, N-not significant evidence, A-absent/no sign, F-not in field of view, C-contrast
problem), (c) mass shape (O-oval, X-irregular, R-round, L-lobulated, A-architectural
distortion), (d) mass margin (S-spiculated, M-microlobulated, V-obscured, I-indistinct/ill
defined, D-circumscribed/well defined/sharply defined), (e) mass density (=: equal/isodense,
+: high, -: low, 0: fat containing/radiolucent).


Objective 2: to analyze the computerized features of missed cancers (false negatives) versus
detected ones (true positives)

Accomplishments: 

1.  Data  preprocessing 

The database currently contains 86 cases of serial mammograms. Because data collection was
difficult and time-consuming, as described above, and the research timeline was limited, some
preprocessing and missed cancer analysis had to be carried out in parallel with data collection.
In this feature analysis study, 73 cases were processed; a more complete analysis will follow.
Preprocessing for data analysis includes image format conversion (from Digital Imaging and
Communications in Medicine (DICOM) format to Sun TAAC Image File Format (VFF)) and image
re-sampling for mass feature extraction (from 50 µm to 200 µm).
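
Re-sampling from 50 µm to 200 µm reduces each image dimension by a factor of four. The report does not state which re-sampling method was used. The sketch below assumes simple 4x4 block averaging on an image held as a list of rows; the function name is illustrative.

```python
def block_average(image, factor=4):
    """Downsample a 2-D image (list of rows) by averaging factor x factor blocks.

    Block averaging is an assumed method; the report only gives the
    source and target pixel sizes (50 um and 200 um, i.e. factor 4).
    Trailing rows/columns that do not fill a whole block are dropped.
    """
    rows = len(image) // factor
    cols = len(image[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [
                image[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# Toy 8x8 image whose pixel value encodes its position.
tile = [[r * 8 + c for c in range(8)] for r in range(8)]
small = block_average(tile)  # 2x2 result
print(small)
```

Averaging, rather than keeping every fourth pixel, suppresses some digitizer noise at the cost of slight smoothing, which is usually acceptable for mass-scale features.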

2.  Mass  feature  analysis:  missed  vs.  detected 

(1) ROI generation: Based on the mass location (center) indicated by the radiologist, two
sets of regions of interest (ROIs), each 256x256 pixels in size, are created. The first
set consists of ROIs that each contain a detected mass; the second set consists of ROIs
with missed masses.

(2) Mass segmentation: Based on the ground truth (mass contour) generated by an
experienced mammographer, the mass was segmented manually by following the outline
interactively with a tool we developed in the Interactive Data Language (IDL)
environment.

(3) Feature calculation: The following features are calculated for both detected and
missed masses using the original ROI image and the segmented image [1]:
Gray-level features: Intensity Mean, Intensity Variance, Intensity Difference between
the mass area and the surrounding background area;

Morphological features: Size, Circularity, Compactness, Roughness, Fluctuation,
FWHM (Full-Width Half-Maximum), Radial Gradient;

Texture features: Generalized Co-occurrence Matrix (GCM) based features (Energy,
Difference Moment, Inverse Difference Moment, Correlation), Laws features.

(4) Statistical analysis: To explore the difference between detected and missed cancer
features, a set of tests was applied to each extracted feature individually. Table 1
lists the p-values of three tests (normality test, paired t-test, and signed rank test)
for each feature [2]. To explore the potential effect of mammography exam view on
interpretation, and the difference in missed cancer features between views, the tests
were run on the Craniocaudal (CC) view only and the Mediolateral Oblique (MLO) view
only, in addition to the combined CC and MLO test. The test results are interpreted
as follows:

■ If the normality p-value is less than 0.05, the difference between the missed and
detected values of a feature is taken to be not normally distributed.

■ If the difference between the missed and detected values of a feature is normally
distributed, we use the paired t-test. If the t-test p-value is less than 0.05, we
have evidence to reject the null hypothesis that the mean of the difference is zero
at the 0.05 significance level (significantly different).

■ If the difference between the missed and detected values of a feature is not normally
distributed, we use the signed rank test. If the signed rank test p-value is less than
0.05, we have evidence to reject the null hypothesis that the median of the difference
is zero at the 0.05 significance level (significantly different).

■ From the table, the features that changed most significantly are size, intensity
variance, intensity difference, compactness, correlations, difference entropy,
and inverse difference moments.

For illustrative purposes, box plots of four features are shown in Figure 2.
Compactness and Correlation 2 (at 45 degrees) differ significantly between the detected
and missed masses, while Laws Feature 8 and Intensity Mean show no statistically
significant difference.

Figure 2. Box plots illustrating the statistical tests of the difference in four
computerized features between missed and detected cancers.
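
The interpretation rule in step (4) can be written as a small decision function. The sketch below is illustrative only: it takes the three p-values as inputs rather than computing them, the function name is hypothetical, and entries reported as "<0.0001" are represented by their upper bound 0.0001. The example rows use the combined CC & MLO values from Table 1.

```python
ALPHA = 0.05  # significance level used throughout the report


def interpret(normality_p, paired_t_p, signed_rank_p, alpha=ALPHA):
    """Pick the test according to the normality result and report significance.

    Returns (name of the test used, True if missed and detected values of
    the feature differ significantly at level alpha).
    """
    if normality_p < alpha:
        # Differences are not normally distributed: use the signed rank test.
        test, p = "signed rank test", signed_rank_p
    else:
        # Differences look normal: use the paired t-test.
        test, p = "paired t-test", paired_t_p
    return test, p < alpha


# (normality, paired t-test, signed rank) for the combined CC & MLO rows.
rows = {
    "Size":           (0.0001, 0.0001, 0.0001),  # all reported as <0.0001
    "Intensity Mean": (0.3901, 0.0901, 0.1206),
    "Compactness":    (0.0002, 0.0002, 0.0006),
    "Roughness":      (0.9990, 0.7341, 0.7418),
}
decisions = {name: interpret(*p) for name, p in rows.items()}
for name, (test, significant) in decisions.items():
    print(f"{name}: {test}, significant={significant}")
```

For these rows the rule selects the signed rank test for Size and Compactness (both significant) and the paired t-test for Intensity Mean and Roughness (neither significant), which agrees with the features singled out in the text.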

3.  Breast  density  analysis

(1) The breast area in a mammogram is segmented from the surrounding background.
The chest wall is removed by manual segmentation. Based on the characteristic
features of the gray-level histogram of the breast at different intensity levels, a
gray-level threshold for each image is determined interactively to segment the dense
area from the breast. Four classes can be distinguished according to the gray-level
histogram of the breast area. Class I is almost entirely fat; its histogram typically
has a single narrow peak. Class II has scattered fibroglandular densities; its
histogram has two peaks, with the smaller peak to the right of the bigger one.
Class III is heterogeneously dense; its histogram has two peaks, but the smaller peak
is to the left of the bigger one. Class IV is extremely dense; its histogram has a
single dominant peak that is wider than the peak in the Class I histogram.

(2)  The  area  of  segmented  dense  tissue  as  a  percentage  of  the  breast  area  is  then 
calculated  as  the  index  of  breast  density. 

(3) A preliminary study was carried out to analyze the breast density feature of missed
cancer cases versus detected cases. The p-values of the statistical tests are listed in Table 1.
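
The density index of step (2) is the share of breast pixels at or above the chosen gray-level threshold. The sketch below is a minimal illustration under stated assumptions: the breast mask and the threshold are given as inputs (in the report both come from segmentation and an interactive choice), and the image is a toy example, not study data.

```python
def density_index(image, breast_mask, threshold):
    """Percentage of breast-area pixels whose gray level is >= threshold.

    image and breast_mask are lists of rows of equal shape; a truthy mask
    value marks a pixel inside the segmented breast area.
    """
    breast_pixels = 0
    dense_pixels = 0
    for image_row, mask_row in zip(image, breast_mask):
        for value, inside in zip(image_row, mask_row):
            if inside:
                breast_pixels += 1
                if value >= threshold:
                    dense_pixels += 1
    if breast_pixels == 0:
        raise ValueError("breast mask is empty")
    return 100.0 * dense_pixels / breast_pixels


# Toy 4x4 image: first column and last row are background.
toy_image = [
    [0, 10, 200, 220],
    [0, 50, 210, 60],
    [0, 40, 30, 20],
    [0, 0, 0, 0],
]
toy_mask = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
toy_density = density_index(toy_image, toy_mask, threshold=128)
print(f"density index: {toy_density:.1f}%")
```

Here 3 of the 9 breast pixels exceed the threshold, giving an index of about 33%.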

4.  Temporal  Analysis 

Temporal analysis was carried out to explore how the changes in features differ among normal
regions, missed cancer regions and detected cancer regions. The following features of each ROI
are calculated [1]: (1) Intensity Mean, (2) Intensity Variance, (3) Energy, (4) Difference
Moment, (5) Inverse Difference Moment, (6) Correlation, and (7) 14 Laws features. Table 2 lists
the p-values of three tests (normality test, paired t-test, and signed rank test) for each
feature [2].
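
Energy, Difference Moment, Inverse Difference Moment and Correlation are co-occurrence matrix features evaluated at 0°, 45°, 90° and 135°. The report's exact definitions are given in its reference [1] and are not reproduced here, so the sketch below assumes the common Haralick-style formulas on a symmetric, normalized co-occurrence matrix with a one-pixel offset. All names are illustrative.

```python
import math

# Pixel offsets (row step, column step) for the four directions; assumed convention.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def cooccurrence(image, angle, levels):
    """Symmetric, normalized gray-level co-occurrence matrix at distance 1."""
    d_row, d_col = OFFSETS[angle]
    counts = [[0] * levels for _ in range(levels)]
    n_rows, n_cols = len(image), len(image[0])
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1  # count both directions (symmetric matrix)
    total = sum(sum(row) for row in counts)
    return [[n / total for n in row] for row in counts]


def texture_features(p):
    """Haralick-style features of a normalized co-occurrence matrix p."""
    idx = [(i, j) for i in range(len(p)) for j in range(len(p))]
    energy = sum(p[i][j] ** 2 for i, j in idx)
    difference_moment = sum((i - j) ** 2 * p[i][j] for i, j in idx)
    inverse_difference_moment = sum(p[i][j] / (1 + (i - j) ** 2) for i, j in idx)
    mean_i = sum(i * p[i][j] for i, j in idx)
    mean_j = sum(j * p[i][j] for i, j in idx)
    var_i = sum((i - mean_i) ** 2 * p[i][j] for i, j in idx)
    var_j = sum((j - mean_j) ** 2 * p[i][j] for i, j in idx)
    if var_i == 0 or var_j == 0:
        correlation = float("nan")  # undefined for a constant image
    else:
        covariance = sum((i - mean_i) * (j - mean_j) * p[i][j] for i, j in idx)
        correlation = covariance / math.sqrt(var_i * var_j)
    return {
        "energy": energy,
        "difference_moment": difference_moment,
        "inverse_difference_moment": inverse_difference_moment,
        "correlation": correlation,
    }


# Toy two-level image with vertical stripes.
stripes = [[0, 1, 0, 1], [0, 1, 0, 1]]
horizontal = texture_features(cooccurrence(stripes, 0, levels=2))
vertical = texture_features(cooccurrence(stripes, 90, levels=2))
print(horizontal)
print(vertical)
```

On the striped toy image, horizontal neighbours always differ (Difference Moment 1, Correlation -1) while vertical neighbours are always equal (Difference Moment 0, Correlation 1), which illustrates why the features are computed per direction.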


Table 1. P-Value Table: Missed vs. Detected

FEATURE NAME           VIEW        NORMALITY   PAIRED T-TEST   SIGNED RANK TEST

Size                   CC & MLO    <0.0001     <0.0001         <0.0001
                       CC          0.0017      <0.0001         <0.0001
                       MLO         <0.0001     <0.0001         <0.0001

Intensity Mean         CC & MLO    0.3901      0.0901          0.1206
                       CC          0.3430      0.1864          0.2675
                       MLO         0.9198      0.2961          0.3102

Intensity Variance     CC & MLO    <0.0001     <0.0001         <0.0001
                       CC          0.9714      <0.0001         <0.0001
                       MLO         <0.0001     <0.0001         <0.0001

Intensity Difference   CC & MLO    0.0020      <0.0001         <0.0001
                       CC          0.0039      <0.0001         <0.0001
                       MLO         0.2125      <0.0001         <0.0001

Circularity            CC & MLO    0.0058      0.2910          0.3514
                       CC          0.2054      0.8544          0.9941
                       MLO         0.0035      0.1815          0.1485

Compactness            CC & MLO    0.0002      0.0002          0.0006
                       CC          0.0033      0.0026          0.0046
                       MLO         0.0056      0.0239          0.0435

Roughness              CC & MLO    0.9990      0.7341          0.7418
                       CC          0.8514      0.8370          0.7942
                       MLO         0.9171      0.7785          0.8501

[Table 1, continued. The remaining rows are scrambled in the source scan: feature labels,
views and p-values are detached from one another, so the values cannot be assigned to rows.
Legible row labels: Radial Gradient; Energy 1 (0°), 2 (45°), 3 (90°), 4 (135°); Difference
Moment 3 (90°), 4; Inverse Difference Moment 1 (0°), 2 (45°), 3 (90°), 4 (135°);
Correlation 1 (0°), 2 (45°), 3 (90°); each with views CC & MLO, CC, and MLO.

Values in scan order, with fragments reproduced as scanned:
<0.0001, <0.0001, 0.0002, <0.0001, 5, 0.6619, 0.6952, 0.8280,
0.0010, <0.0001, 0, <0.0001, 0.5219, 0.9513, 0.1456, 0.0132, 0402, 0.0916,
<0.0001, 0.0029, 0.0490, 0.0272, <0.0001, 0.0134,
<0.0001, <0.0001, <0.0001, <0.0001, <0.0001, <0.0001, 0.0006, <0.0001, <0.0001, 0.0152,
<0.0001, 0.9652, 0.5762, 0.6120, 0.6655, 0.1706, 0.5049, 0.0151, 0.4790, 0.0733,
0.7580, 0.1075, 0.0006, 0.0002, 0.0004, 0.0154, 0.0135, <0.0001, <0.0001, <0.0001,
<0.0001, <0.0001, 0.0001, <0.0001, <0.0001, <0.0001, <0.0001, 0.0028, 0.0010, <0.0001,
<0.0001, 0.1356, 0.0574, 0.1446, 0.0245, 0.1294, 0.1487, 0.0989, 0.0630, 0.5366,
0.0073, 0.2963]

Table 2. Temporal Comparison

FEATURE NAME                         NORMALITY   PAIRED T-TEST   SIGNED RANK TEST

Intensity Mean                       0.8584      0.0099          0.0069
Intensity Variance                   0.1426      0.4962          0.3167
Energy 1 (0°)                        0.9759      0.9445          0.8176
Energy 2 (45°)                       0.9510      0.9592          0.8332
Energy 3 (90°)                       0.9791      0.9562          0.8176
Energy 4 (135°)                      0.9808      0.9378          0.8020
Difference Moment 1 (0°)             0.9001      0.4837          0.5001
Difference Moment 2 (45°)            0.3719      0.6939          0.6806
Difference Moment 3 (90°)            0.9847      0.3220          0.3799
Difference Moment 4 (135°)           <0.0001     0.3010          0.6513
Inverse Difference Moment 1 (0°)     0.9352      0.5495          0.6083
Inverse Difference Moment 2 (45°)    0.8829      0.8537          0.9441
Inverse Difference Moment 3 (90°)    0.8287      0.4730          0.4622
Inverse Difference Moment 4 (135°)   0.7900      0.4166          0.4378
Correlation 1 (0°)                   <0.0001     0.2298          0.1328
Correlation 2 (45°)                  <0.0001     0.2983          0.1274
Correlation 3 (90°)                  0.0051      0.3962          0.2050
Correlation 4 (135°)                 <0.0001     0.1911          0.1383
Laws Feature 1                       <0.0001     0.3688          0.2075
Laws Feature 2                       0.0107      0.0557          0.0152
Laws Feature 3                       0.0007      0.1023          0.0196
Laws Feature 4                       0.0443      0.0350          0.0140
Laws Feature 5                       <0.0001     0.7859          0.0886
Laws Feature 6                       <0.0001     0.1694          0.5749
Laws Feature 7                       0.0037      0.0171          0.0067
Laws Feature 8                       0.0008      0.0346          0.0151
Laws Feature 9                       <0.0001     0.0753          0.0067
Laws Feature 10                      0.0011      0.3924          0.0554
Laws Feature 11                      0.2971      0.0058          0.0067
Laws Feature 12                      <0.0001     0.3370          0.0215
Laws Feature 13                      <0.0001     0.0952          0.0067
Laws Feature 14                      0.2214      0.0033          0.0015


Objective 3: to determine the effect of density pattern on cancer detection
Accomplishments: 

(1)  Segmentation  of  glandular  regions  in  mammogram 

An automatic approach was applied to segment mammographic dense tissue. It is a
statistics-based method developed in our lab [1]. Segmentation was performed on both cancerous
and normal mammograms at the screening-detected and screening-missed stages. The segmented
dense tissue area as a percentage of the whole breast area is calculated as the index of breast
density. Figure 1 shows the histograms of the density index for the different types of
mammograms. To check the correlation of density between mammograms at the missed and detected
stages, two kinds of correlation analysis were performed: Pearson’s correlation and Spearman’s
rank correlation [2]. The Pearson correlation coefficient measures the strength and direction
of a linear relationship between two variables. It has two limitations: it is strongly affected
by outliers in the data, and it measures only linear relationships. Spearman’s rank correlation
coefficient is a nonparametric (distribution-free) rank statistic that measures the strength of
the association between two variables. Because it depends only on ranks, it is largely
unaffected by outliers. The correlation coefficients are listed in Table 1. It is observed that
(i) Pearson’s and Spearman’s rank correlations agree closely, i.e. no significant outliers exist
in the density segmentation; (ii) the breast density segmented at the missed stage is correlated
with that at the detected stage; (iii) the correlation between the missed and detected stages is
higher for normal mammograms than for cancerous mammograms. One explanation is that cancerous
mammograms usually have a more complicated density pattern and are statistically of higher
density, as shown below, which leads to larger variations in segmentation.
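
The contrast between the two coefficients can be illustrated with a short sketch. Spearman's coefficient is computed here as Pearson's coefficient on average ranks, which is its standard definition. The paired density values are hypothetical, chosen so that one pair acts as an outlier; they are not study data.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical density indices (%) for the same breasts at two exams.
missed_stage = [10, 20, 30, 40, 50]
detected_stage = [12, 19, 33, 41, 95]  # last value is an outlier

r_pearson = pearson(missed_stage, detected_stage)
r_spearman = spearman(missed_stage, detected_stage)
print(f"Pearson: {r_pearson:.3f}, Spearman: {r_spearman:.3f}")
```

In this toy case the ordering of the two series is identical, so Spearman's coefficient is 1, while the outlier pulls Pearson's coefficient below 1. Close agreement of the two coefficients, as in Table 1, therefore suggests that outliers are not driving the result.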


Table 1. Correlation of Density Segmentation.

Variable 1      Variable 2        Pearson’s correlation   Spearman’s correlation
                                  coefficient             coefficient

Missed_cancer   Detected_cancer   0.5896                  0.5946
Missed_normal   Detected_normal   0.6908                  0.6882

Figure 1. Histograms of breast density: (a) cancerous mammogram at missed stage; (b) cancerous mammogram at
detected stage; (c) normal mammogram at missed stage; (d) normal mammogram at detected stage.

(2)  Density  analysis  of  normal  and  cancerous  mammograms 

A set of statistical tests was performed to examine (i) whether density differs between the
mammograms at the detected stage and those at the missed stage, and (ii) whether density differs
between the normal mammograms and the cancerous mammograms. Table 2 lists the p-values of the
t-test and the Wilcoxon rank test for the density difference between detected-stage and
missed-stage mammograms, and between normal and cancerous mammograms. If the difference in
density index is normally distributed, we use the t-test; otherwise we use the Wilcoxon rank
test. If the test p-value is less than 0.05, we have evidence to reject the null hypothesis that
the mean of the difference is zero at the 0.05 significance level, i.e. the densities are
significantly different [2]. It is observed that (i) there is no significant change in density
between the detected and missed stages for either the normal or the cancerous mammograms. This
is because most of the mammograms at the missed and detected stages were taken in consecutive
years, as shown in Figure 2, during which no significant change in the breast is expected.
(ii) There is a significant difference in density between normal and cancerous mammograms at
both the detected and missed stages. Specifically, the cancerous mammograms have a higher
density than the normal mammograms.


Table 2. Statistical Test of Density Difference

Variable 1        Variable 2        T-test p-value   Wilcoxon test p-value

Missed_cancer     Detected_cancer   0.4793           0.5919
Missed_normal     Detected_normal   0.6708           0.5326
Missed_cancer     Missed_normal     5.977e-07        3.339e-06
Detected_cancer   Detected_normal   2.579e-06        5.067e-06

Figure 2. A distribution of interval between mammograms taken at missed and detected stages.

(3)  Effect  of  density  pattern  on  CAD  detection  performance 

In the study described above, we demonstrated a statistical difference in breast density
between the normal and cancerous mammograms. It has also been reported that lesions occurring
in dense breasts are statistically more likely to be missed in screening mammograms [3].
However, there has been no reported study of the effect of density on CAD detection
performance. In this research, as a baseline study, we tested our existing CAD algorithm on the
serial database to examine the differences in detection performance for cases with different
breast density. Detailed technical information on the CAD algorithm can be found in [4][5].
Because of the limited size of the database, the mammograms were classified into two categories
corresponding to density percentages of less than or more than 25%. Figures 3 and 4 show the
free-response receiver operating characteristic (FROC) curves of CAD detection results for high
(>25%) and low (<25%) density cases at the detected and missed stages, respectively. It is
observed that (i) detection performance on less dense cases is better than on highly dense
cases. In other words, as for radiologists in mammogram screening, lesions occurring in dense
breasts are more likely to be missed in CAD detection; (ii) the difference in detection
performance between high and low density cases is smaller at the detected stage than at the
missed stage, i.e. lesions on dense mammograms are even more difficult to detect, compared with
lesions on low density mammograms, at the missed stage.
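
An FROC curve plots lesion sensitivity against the average number of false positives per image as the detection threshold is lowered. The sketch below shows that computation in its simplest form. It is not the report's CAD code: the candidate scores are hypothetical, and each candidate is assumed to be already labelled with the lesion it hits (or None for a false positive).

```python
def froc_points(candidates, n_lesions, n_images):
    """Return FROC operating points, one per distinct score threshold.

    candidates: list of (score, lesion_id) pairs, lesion_id is None for a
    false positive. Each point is (false positives per image, sensitivity).
    """
    points = []
    for threshold in sorted({score for score, _ in candidates}, reverse=True):
        kept = [(s, lesion) for s, lesion in candidates if s >= threshold]
        # A lesion counts once, however many candidates hit it.
        lesions_found = {lesion for _, lesion in kept if lesion is not None}
        false_positives = sum(1 for _, lesion in kept if lesion is None)
        points.append((false_positives / n_images, len(lesions_found) / n_lesions))
    return points


def density_group(density_percent, cutoff=25.0):
    """Two-way split used in the report; treatment of exactly 25% is assumed."""
    return "high" if density_percent > cutoff else "low"


# Hypothetical CAD output for 2 images containing 2 lesions in total.
toy_candidates = [
    (0.9, "lesion-1"),
    (0.8, None),
    (0.7, "lesion-2"),
    (0.6, None),
    (0.5, "lesion-1"),  # second hit on the same lesion
    (0.4, None),
]
toy_curve = froc_points(toy_candidates, n_lesions=2, n_images=2)
for fp_rate, sensitivity in toy_curve:
    print(f"{fp_rate:.1f} FP/image -> sensitivity {sensitivity:.2f}")
```

Running `froc_points` separately on the cases for which `density_group` returns "low" and "high" gives one curve per density category, which is the comparison made in Figures 3 and 4.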

Figure 3. FROC curves of CAD cancer detection on mammograms at screening detected stage.

Figure 4. FROC curves of CAD cancer detection on mammograms at screening missed stage.


Objective 4: to design new CAD system for improving missed cancer detection

The new CAD system is based on our two generations of CAD algorithms for mass detection in
digitized mammograms [4][5] and incorporates the results of the missed cancer analysis in its
design. The strategies used in this study are:

(a) Multi-mode detection by breast density classification: The baseline testing study with the
existing CAD algorithm demonstrated that lesions occurring in dense breasts are more likely to
be missed in CAD detection. Therefore, to improve the detection of missed cancer, each
mammogram is first classified by the breast density index defined above, and the detection mode
appropriate to its class is then applied. Because of the limited size of the database in this
study, each input mammogram was classified into one of two categories corresponding to density
percentages of <25% and >25%.

(b) Breast area partition and region-based adaptive detection: Because the probability that a
cancer is missed in a screening mammogram varies widely with its location, breast area
partition provides the basis for further adaptive processing. The partition process consists of
three steps: (i) breast boundary and nipple detection; (ii) pectoral muscle and view (CC or
MLO) identification; (iii) area partition. Figure 5 shows the likelihood of missed cancers in
each region.

(c) Weighted classification using the distinguishing features identified in the missed cancer
analysis: The classifier is a modified hybrid structure in which (i) a combined "hard" and
"soft" decision classification strategy is applied [4][5]; (ii) decision thresholds are
adjusted based on the missed cancer feature analysis. For example, a significant difference in
the feature “mass size” was observed between the detected and missed stages, so the threshold
for this feature in the decision tree was reduced to increase the chance that a missed cancer
is detected; (iii) competing candidates are weighted using the region likelihood value.

Figures 6 and 7 show the FROC curves of detection on mammograms at the missed and detected
cancer stages. The new CAD system provides better detection performance at both stages.
However, because the new CAD is designed with a focus on missed cancer, a bigger improvement is
obtained for missed cancer detection.
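
How strategies (a) and (c)(iii) might fit together can be sketched in a few lines. Everything in the sketch is hypothetical: the report does not give its weighting formula, the multiplication of score by region likelihood is an assumed choice, and the region likelihood values and candidate scores are invented for illustration.

```python
def detection_mode(density_percent, cutoff=25.0):
    """Select the detection mode from the breast density index.

    The two-way split at 25% follows the report; the mode names and the
    handling of exactly 25% are assumptions.
    """
    return "high-density" if density_percent > cutoff else "low-density"


def rank_candidates(candidates, region_likelihood):
    """Order candidate IDs by classifier score weighted by region likelihood.

    Multiplying the two values is an assumed weighting scheme. Regions
    without a likelihood entry keep their raw score (weight 1.0).
    """
    weighted = [
        (cand["score"] * region_likelihood.get(cand["region"], 1.0), cand["id"])
        for cand in candidates
    ]
    return [cand_id for _, cand_id in sorted(weighted, reverse=True)]


# Invented values: likelihood that a cancer in each region is missed.
toy_likelihood = {"C": 0.5, "RU": 0.9}
toy_candidates = [
    {"id": "A", "score": 0.8, "region": "C"},   # weighted 0.40
    {"id": "B", "score": 0.6, "region": "RU"},  # weighted 0.54
]
mode = detection_mode(32.0)
ranking = rank_candidates(toy_candidates, toy_likelihood)
print(mode, ranking)
```

In this toy case candidate B overtakes A because it lies in a region where cancers are more often missed, which is the intent of weighting the candidate competition by region.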


Figure 5. Distribution of cancers at different locations on (a) CC view and (b) MLO view, where SA=Subareolar,
C=Central, CL=Lower-Central, CU=Upper-Central, RC=Central-Retroglandular, RU=Upper-Retroglandular,
RL=Lower-Retroglandular, L=Lateral, CL=Central-Lateral, CM=Medial-Central, RM=Medial-Retroglandular,
RL=Lateral-Retroglandular.

Figure 6. Improvement of CAD cancer detection on mammograms at screening missed stage.

Figure 7. Improvement of CAD cancer detection on mammograms at screening detected stage.


Objective  5:  to  measure  the  stand-alone  detection  sensitivity/specificity  of  the  new  CAD  system. 
Accomplishments: 


A new CAD system was designed in the second year of research, based on our two generations of
CAD algorithms for mass detection in digitized mammograms [1][2] and on the results of the
missed cancer analysis in this project. The strategies used in the new CAD system design
include (a) multi-mode detection by breast density classification; (b) breast area partition and
region-based adaptive detection; (c) weighted classification using the distinguishing features
identified in the missed cancer analysis.

To evaluate the new CAD strategy, it is important to test the sensitivity/specificity and early
cancer detection performance of a radiologist assisted by the CAD system and to compare them
with the performance of single and double reading. This requires large, costly trials with
several radiologists and a large number of normal cases to represent the screening situation.
Given the time and budget limits of this project, it is important to first measure the
stand-alone performance of the new CAD system and compare it with the existing CAD [1] before
starting such trials.

Figure  1  and  2  show  a  comparison  of  the  FROC  curves  of  overall  detection  performance  by  new 
and  conventional  CAD  systems  on  mammograms  at  missed  and  detected  cancer  stages.  More 
detailed  comparisons  of  detection  on  mammograms  with  low  (<25%)  and  high  (>25%)  density  at 
missed  and  detected  stages  are  shown  in  Figure  3-6.  It  is  observed  that  the  new  CAD  system 
provides  a  better  detection  performance  at  both  missed  and  detected  stages.  However,  because 
the  new  CAD  is  designed  with  focus  on  missed  cancer,  a  bigger  improvement  is  obtained  for 

Figure 5. Improvement of CAD cancer detection on low-density mammograms at the screening-detected stage.

Figure 6. Improvement of CAD cancer detection on high-density mammograms at the screening-detected stage.
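
An FROC curve plots detection sensitivity against the average number of false positives per image as the CAD decision threshold is varied. The sketch below shows one way such operating points could be computed from scored CAD marks. It is an illustration only: the input layout, the function name `froc_points`, and the assumption that each true lesion is hit by at most one mark are not taken from this report.

```python
from typing import List, Tuple


def froc_points(
    detections: List[Tuple[int, float, bool]],
    n_lesions: int,
    n_images: int,
) -> List[Tuple[float, float, float]]:
    """Return (threshold, false positives per image, sensitivity) triples.

    detections: one (image_id, score, hits_lesion) tuple per CAD mark.
    Assumes each true lesion is hit by at most one mark, so counting
    true-positive marks equals counting detected lesions.
    """
    points = []
    # Sweep the threshold from strictest to most permissive.
    for threshold in sorted({score for _, score, _ in detections}, reverse=True):
        kept = [d for d in detections if d[1] >= threshold]
        true_pos = sum(1 for d in kept if d[2])
        false_pos = len(kept) - true_pos
        points.append((threshold, false_pos / n_images, true_pos / n_lesions))
    return points


if __name__ == "__main__":
    marks = [(0, 0.9, True), (0, 0.8, False), (1, 0.7, True), (1, 0.4, False)]
    for threshold, fp_per_image, sensitivity in froc_points(marks, n_lesions=2, n_images=2):
        print(threshold, fp_per_image, sensitivity)
```

Two CAD systems can be compared by computing these points for each system on the same cases and comparing sensitivity at matched false positive rates.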


Objective  6:  evaluation  of  early  detection 
Accomplishments: 


Early detection using the CAD system was evaluated. The earliness of detection is
measured as the number of months by which CAD detects the cancer before a radiologist
detects it without the assistance of the CAD system. It is assumed here that the radiologist
accepts all true positive detections and that CAD false positives have no negative effect on
diagnostic decision making.
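
The earliness measure defined above can be sketched as follows. This is a minimal illustration under stated assumptions: the per-exam input layout, the function names, and the choice to count whole calendar months (ignoring the day of the month) are hypothetical and not taken from this report.

```python
from datetime import date
from typing import Optional, Sequence, Tuple


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def months_of_early_detection(
    exams: Sequence[Tuple[date, bool]],
    radiologist_detection: date,
) -> Optional[int]:
    """Months by which CAD precedes the unassisted radiologist.

    exams: (exam_date, cad_true_positive) for each serial mammogram of a case.
    Returns None when CAD never marks the cancer on or before the exam at
    which the radiologist detected it.
    """
    cad_dates = [d for d, hit in exams if hit and d <= radiologist_detection]
    if not cad_dates:
        return None
    return months_between(min(cad_dates), radiologist_detection)


if __name__ == "__main__":
    case = [
        (date(2000, 3, 10), False),  # CAD misses the lesion on this prior exam
        (date(2001, 4, 2), True),    # CAD marks the lesion on this prior exam
        (date(2002, 5, 20), True),   # exam at which the radiologist detected it
    ]
    print(months_of_early_detection(case, date(2002, 5, 20)))
```

Consistent with the assumption stated above, every CAD true positive is treated as accepted by the radiologist, so the earliest true-positive exam defines the CAD detection date.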

Figures 7 and 8 present the number of months of early detection at different false positive rates
for the existing and new CAD systems, respectively. It is observed that cancers could be detected
earlier, with fewer false positive signals, using the new CAD strategy.


[Plot for Figure 7. Vertical axis: Months (surviving tick value: 25). Horizontal axis: False Positive (tick values 1.46, 2.51, 3.06, 3.57, 4.12).]

Figure  7.  Early  detection  of  cancers  with  existing  CAD  system 

Figure 8. Early detection of cancers with new CAD system


RESEARCH ACCOMPLISHMENTS


1. A mammogram database containing 86 cases of serial mammograms was generated;
the cases were selected by reviewing 1334 cases. From this database, we further
generated three datasets: a missed cancer dataset, a detected cancer dataset, and a normal
dataset.

2. A series of statistical analyses was performed on the computerized features of missed cancers (false
negatives) versus detected ones (true positives) and on their interval changes.
Based on the test P-values, the features that have a significant impact on the radiologist’s diagnosis
and that could potentially be useful for early detection could be identified.

3. A comprehensive analysis of the effect of breast density on cancer detection was performed.
The accomplishments include breast dense tissue segmentation, correlation analysis of
mammogram density features between the missed and detected stages, statistical testing of
the density difference between normal and cancerous mammograms, and a baseline study of the
effect of density on CAD detection performance using the existing algorithm.

4. A new CAD system was designed based on the existing second-generation CAD
algorithm and the missed cancer analysis. Owing to the modification strategies
adopted in the new system, detection performance improved for mammograms at both the
detected and missed stages. With the focus on missed cancer analysis and
detection, a bigger improvement was obtained in detecting missed cases, although
overall detection performance at the missed stage remains lower than at the detected stage.

5. The stand-alone detection sensitivity/specificity of the new CAD system was
evaluated. Overall detection performance and detection on mammograms with low (<25%)
and high (>25%) density were compared at the missed and detected stages.
It is observed that, owing to the modification strategies adopted in
the new system, detection performance improved for mammograms at both the detected
and missed stages. Because the new CAD system is designed with a focus on missed
cancer analysis and detection, a bigger improvement was obtained in detecting missed
cases, although overall detection performance at the missed stage remains lower than at the
detected stage. The improvements on low- and high-density mammograms are comparable.

6. Early detection using the CAD system was evaluated. It is observed that
cancers could be detected earlier, with fewer false positive signals, using the new CAD
strategy.

(a) Y. Qiu, L. Li, D. Goldgof, R.A. Clark, “Three dimensional deformation model for lesion
correspondence in breast imaging,” Proceedings of SPIE Medical Imaging, 2003.